A Super-Programming Technique for Large Sparse Matrix Multiplication on PC Clusters

Authors

  • Dejiang Jin
  • Sotirios G. Ziavras
Abstract

The multiplication of large sparse matrices is a basic operation in many scientific and engineering applications. Several high-performance library routines exist for this operation, and they are often optimized for the target architecture. In a parallel environment, it is essential to partition the entire operation into well-balanced tasks and assign them to individual processing elements. Most existing techniques partition the given matrices based on some kind of workload estimation. For irregular sparse matrices on PC clusters, however, the workloads may not be well estimated in advance, and any approach other than run-time dynamic partitioning may degrade performance. In this paper, we apply our super-programming approach [24] to parallel large matrix multiplication on PC clusters. In our approach, tasks are partitioned into super-instructions that are dynamically assigned to member computer nodes. Thus, the load balancing logic is separated from the computing logic; the former is taken over by the runtime environment. Our super-programming approach eases program development and targets high efficiency in dynamic load balancing. Workloads can be balanced effectively and the optimization overhead is small. The results demonstrate the viability of our approach.
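
The idea of dynamically assigning coarse-grained tasks can be illustrated with a minimal sketch. This is not the authors' implementation: threads stand in for cluster nodes, a shared queue stands in for the runtime environment, and all names (`split_rows`, `multiply_rows`, `dynamic_spgemm`) are hypothetical. Each block of rows plays the role of one super-instruction, and idle workers pull the next block at run time, so no workload estimate is needed in advance.

```python
import queue
import threading
from collections import defaultdict


def split_rows(a, block_rows):
    """Group the row indices of sparse matrix `a` (row -> {col: value}) into blocks."""
    rows = sorted(a)
    return [rows[i:i + block_rows] for i in range(0, len(rows), block_rows)]


def multiply_rows(a, b, rows):
    """One coarse task: compute the rows of A*B that correspond to `rows` of A."""
    out = {}
    for i in rows:
        acc = defaultdict(float)
        for k, a_ik in a[i].items():
            # Only nonzero entries of row k of B contribute to row i of the product.
            for j, b_kj in b.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            out[i] = dict(acc)
    return out


def dynamic_spgemm(a, b, workers=4, block_rows=2):
    """Multiply sparse matrices with run-time (pull-based) task assignment."""
    tasks = queue.Queue()
    for rows in split_rows(a, block_rows):
        tasks.put(rows)

    result = {}
    lock = threading.Lock()

    def worker():
        # Each worker keeps pulling tasks until the queue is empty, so a worker
        # that draws cheap blocks simply processes more of them.
        while True:
            try:
                rows = tasks.get_nowait()
            except queue.Empty:
                return
            partial = multiply_rows(a, b, rows)
            with lock:
                result.update(partial)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    a = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}, 2: {0: 4.0}}
    b = {0: {1: 5.0}, 1: {0: 6.0}, 2: {1: 7.0}}
    print(dynamic_spgemm(a, b, workers=2, block_rows=1))
```

Note that the computing logic (`multiply_rows`) knows nothing about scheduling; the queue and the worker loop carry all of the load-balancing logic, which mirrors the separation described in the abstract. Python threads are used here only to show the scheduling pattern and will not give a real speedup for this workload.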

Similar resources

Parallelizing Sparse Grid Volume Visualization with Implicit Preview and Load Balancing

New algorithms that work entirely on sparse grids can create data sets that cannot be handled on uniform grids any more due to their size. On the other hand, most visualization techniques are only capable of handling uniform grids. As the interpolation on sparse grids is a complicated and time consuming process, direct volume visualization is unthinkable for bigger data sets until the underlyin...

Blocked-based sparse matrix-vector multiplication on distributed memory parallel computers

The present paper discusses the implementations of sparse matrix-vector products, which are crucial for high performance solutions of large-scale linear equations, on a PC-Cluster. Three storage formats for sparse matrices (compressed row storage, block compressed row storage, and sparse block compressed row storage) are evaluated. Although using BCRS format reduces the execution time but the impr...

Optimized Communication Patterns on Workstation Clusters

The limited communication bandwidth and high startup latencies of clustered workstations restrict their use to problems with sparse communication patterns or good concurrency between calculation and communication. First we describe our modifications to the popular PVM [5] message passing library, and on performance improvements using the PVM package on an FDDI-ring. Applications developed with a ...

Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model

We present a library for parallel block-sparse matrix-matrix multiplication on distributed memory clusters. By using a quadtree matrix representation data locality is exploited without any prior information about the matrix sparsity pattern. A distributed quadtree matrix representation is straightforward to implement due to our recent development of the Chunks and Tasks programming model [Paral...

Large-scale Inversion of Magnetic Data Using Golub-Kahan Bidiagonalization with Truncated Generalized Cross Validation for Regularization Parameter Estimation

In this paper a fast method for large-scale sparse inversion of magnetic data is considered. The L1-norm stabilizer is used to generate models with sharp and distinct interfaces. To deal with the non-linearity introduced by the L1-norm, a model-space iteratively reweighted least squares algorithm is used. The original model matrix is factorized using the Golub-Kahan bidiagonalization that proje...

Journal:
  • IEICE Transactions

Volume 87-D

Publication date: 2004